-
Notifications
You must be signed in to change notification settings - Fork 256
Feat: Adding a CNI Telemetry sidecar to CNS pod to replace the azure-vnet-telemetry binary #3824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes the CNI telemetry service initialization issue where it was failing to start due to an empty Application Insights instrumentation key. The fix ensures that the telemetry service uses the proper AI configuration from the CNS configuration instead of an empty basic configuration.
- Updates the
startTelemetryService
function to accept CNS configuration and properly initialize AI telemetry settings - Adds validation to check for Application Insights instrumentation key before starting the telemetry service
- Introduces proper telemetry configuration mapping from CNS settings to AI config
cns/service/main.go
Outdated
@@ -94,6 +94,7 @@ const ( | |||
// Service name. | |||
name = "azure-cns" | |||
pluginName = "azure-vnet" | |||
aiPluginName = "AzureCNI" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a typo in the PR title 'Telemtry' which should be 'Telemetry'. While this constant name is correct, ensure consistency in naming throughout the codebase.
Copilot uses AI. Check for mistakes.
cns/service/main.go
Outdated
if cnsconfig.TelemetrySettings.AppInsightsInstrumentationKey != "" { | ||
err = tb.CreateAITelemetryHandle(aiConfig, ts.DisableTrace, ts.DisableMetric, ts.DisableEvent) | ||
} else { | ||
logger.Printf("No Application Insights key provided for CNI telemetry service") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should check other way and remove else?
if cnsconfig.TelemetrySettings.AppInsightsInstrumentationKey == "" {
logger.Printf("No Application Insights key provided for CNI telemetry service")
return
}
err = tb.CreateAITelemetryHandle(aiConfig, ts.DisableTrace, ts.DisableMetric, ts.DisableEvent)
Discussed in the team meeting today - we may want to consider doing telemetry for CNI directly in CNI rather than CNS, to account for scenarios where CNI can't talk to CNS. |
2389f47
to
df332ec
Compare
dde3441
to
b59b87b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
based on my understanding, this pr replaces azure-vnet-telemetry binary on the node with a sidecar that does the same thing. So CNI would send AI events to this sidecar via a socket if it is present (rather than the azure vnet telemetry long running process). I believe right now azure-vnet-telemetry is still on the vhd (?) so CNI will still attempt to start that azure vnet telemetry binary service. Can you check that there's no race or issue with both CNS and Azure CNI trying to start the telemetry service at the same time?
Makefile
Outdated
CNI_TELEMETRY_SIDECAR_DIR = $(REPO_ROOT)/cns/cni-telemetry-sidecar | ||
CNI_TELEMETRY_SIDECAR_BUILD_DIR = $(BUILD_DIR)/cni-telemetry-sidecar | ||
CNI_TELEMETRY_SIDECAR_AI_ID = $(CNI_AI_ID) # Reuse CNI AI ID | ||
CNI_TELEMETRY_SIDECAR_VERSION = $(CNS_VERSION) # Version follows CNS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we move the version to where all the other version variables are?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also if you're intending this to be like azure ip masq merger with its own release versions etc. you'll need to add the exclusion to the tag above: ACN_VERSION ?= $(shell git describe --exclude "azure-ip-masq-merger*" --exclude "azure-ipam*" --exclude "dropgz*" --exclude "zapai*" --exclude "ipv6-hp-bpf*" --tags --always)
Makefile
Outdated
@@ -123,7 +128,7 @@ all-binaries-platforms: ## Make all platform binaries | |||
|
|||
# OS specific binaries/images | |||
ifeq ($(GOOS),linux) | |||
all-binaries: acncli azure-cni-plugin azure-cns azure-npm azure-ipam azure-ip-masq-merger ipv6-hp-bpf | |||
all-binaries: acncli azure-cni-plugin azure-cns azure-npm azure-ipam azure-ip-masq-merger ipv6-hp-bpf cni-telemetry-sidecar | |||
all-images: npm-image cns-image cni-manager-image azure-ip-masq-merger-image ipv6-hp-bpf-image |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like you're making a new image-- if this is like a new sidecar image/container image, there are other dockerfiles/scripts you may need to edit (ex: for the signed pipeline). would the image target be added here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per @rbtr request I moved it to azure-cni image. I removed all the Makefile changes as a result.
Makefile
Outdated
@@ -275,6 +284,7 @@ CNI_IMAGE = azure-cni | |||
CNS_IMAGE = azure-cns | |||
NPM_IMAGE = azure-npm | |||
AZURE_IP_MASQ_MERGER_IMAGE = azure-ip-masq-merger | |||
CNI_TELEMETRY_SIDECAR_IMAGE = azure-cni-telemetry-sidecar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if you want to keep the variable naming consistent? it's CNI_TELEMETRY here but the image name is azure-cni-telemetry-sidecar-- is the "sidecar" part of the name necessary (here and elsewhere)? If it's a container image I feel like it's already implied that it'll be running in a sidecar or similar.
Makefile
Outdated
# Create a CNI Telemetry Sidecar archive for the target platform. | ||
.PHONY: cni-telemetry-sidecar-archive | ||
cni-telemetry-sidecar-archive: cni-telemetry-sidecar-binary | ||
ifeq ($(GOOS),linux) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this binary is on windows and linux would we omit this linux check?
|
||
// Validate batch interval for optimal Azure ingestion | ||
if config.TelemetrySettings.TelemetryBatchIntervalInSecs <= 0 { | ||
logger.Printf("Warning: Invalid telemetry batch interval, using default") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does ai under the hood use the default automatically if it's not valid, or should this also modify the value to the default (also applies to above conditional)?
cns/cni-telemetry-sidecar/main.go
Outdated
flag.Parse() | ||
|
||
// Initialize logging for the CNI telemetry sidecar | ||
logger.InitLogger("azure-cns-cni-telemetry-sidecar", 1, 1, "/var/log/azure-cns-telemetry") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should the naming be the same-- would it be cns or cni telemetry? Like below the logger prints that it's azure cni telemetry sidecar but the log is to the file azure-cns-telemetry
cns/cni-telemetry-sidecar/sidecar.go
Outdated
const ( | ||
// CNI telemetry constants aligned with azure-vnet-telemetry | ||
cniTelemetryAppName = "azure-vnet-telemetry" | ||
cniTelemetryVersion = "1.0.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: do we want to start with 0.0.x release train or 1.0.x?
c0a81ad
to
a949253
Compare
8cf84d3
to
0de7cc1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of a new image, let's package this in azure-cni - it's already cached and even embedded in most CNS daemonsets today
0de7cc1
to
6c70dc7
Compare
68a8df3
to
e505a24
Compare
We are aming to run this on Stateless CNI in specific K8s version. So, it will not run the azure-vnet-telemtry. |
Makefile
Outdated
@@ -84,6 +84,10 @@ ACN_PACKAGE_PATH = github.com/Azure/azure-container-networking | |||
CNI_AI_PATH=$(ACN_PACKAGE_PATH)/telemetry.aiMetadata | |||
CNS_AI_PATH=$(ACN_PACKAGE_PATH)/cns/logger.aiMetadata | |||
NPM_AI_PATH=$(ACN_PACKAGE_PATH)/npm.aiMetadata | |||
CNI_TELEMETRY_SIDECAR_DIR = $(REPO_ROOT)/cns/cni-telemetry-sidecar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we have azure-
prefix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can it be acn-node-telemetry
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tamilmani1989 I removed all the Makefile changes and add the binary to the CNI DOCker instead per @rbtr suggestion.
8bb002c
to
4c8edcf
Compare
4c8edcf
to
e9e0c11
Compare
This pull request is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days |
Pull request closed due to inactivity. |
Adding a CNI Telemetry sidecar to CNS pod to replace the azure-vnet-telemetry binary.
Requirements:
Notes: